Jupiter
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Alam, Md Tanvirul, Bhusal, Dipkamal, Ahmad, Salman, Rastogi, Nidhi, Worth, Peter
Large Language Models (LLMs) have demonstrated strong capabilities in natural language reasoning, yet their application to Cyber Threat Intelligence (CTI) remains limited. CTI analysis involves distilling large volumes of unstructured reports into actionable knowledge, a process where LLMs could substantially reduce analyst workload. CTIBench introduced a comprehensive benchmark for evaluating LLMs across multiple CTI tasks. In this work, we extend CTIBench by developing AthenaBench, an enhanced benchmark that includes an improved dataset creation pipeline, duplicate removal, refined evaluation metrics, and a new task focused on risk mitigation strategies. We evaluate twelve LLMs, including state-of-the-art proprietary models such as GPT-5 and Gemini-2.5 Pro, alongside seven open-source models from the LLaMA and Qwen families. While proprietary LLMs achieve stronger results overall, their performance remains subpar on reasoning-intensive tasks, such as threat actor attribution and risk mitigation, with open-source models trailing even further behind. These findings highlight fundamental limitations in the reasoning capabilities of current LLMs and underscore the need for models explicitly tailored to CTI workflows and automation.
- North America > United States > New York > Monroe County > Rochester (0.04)
- North America > United States > Florida > Palm Beach County > Jupiter (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration
Cavanagh, Joseph M., Sun, Kunyang, Gritsevskiy, Andrew, Bagni, Dorian, Bannister, Thomas D., Head-Gordon, Teresa
Here we show that a Large Language Model (LLM) can serve as a foundation model for a Chemical Language Model (CLM) which performs at or above the level of CLMs trained solely on chemical SMILES string data. Using supervised fine-tuning (SFT) and direct preference optimization (DPO) on the open-source Llama LLM, we demonstrate that we can train an LLM to respond to prompts such as generating molecules with properties of interest to drug development. This overall framework allows an LLM not just to be a chatbot client for chemistry and materials tasks, but to be adapted to speak more directly as a CLM that can generate molecules with user-specified properties.
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > Florida > Palm Beach County > Jupiter (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
The Use of a Large Language Model for Cyberbullying Detection
Ogunleye, Bayode, Dharmaraj, Babitha
The dominance of social media has added to the channels of bullying for perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in today's cyber world, and is a severe threat to the mental and physical health of citizens. This opens the need to develop a robust system to prevent bullying content from online forums, blogs, and social media platforms to manage the impact in our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performances are not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, the LLMs have not been applied extensively for CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We have prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for datasets D1 and D2 showed that RoBERTa outperformed other models.
- North America > United States > New Jersey > Middlesex County > Piscataway (0.05)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (17 more...)
Robust Meta-Model for Predicting the Need for Blood Transfusion in Non-traumatic ICU Patients
Rafiei, Alireza, Moore, Ronald, Choudhary, Tilendra, Marshall, Curtis, Smith, Geoffrey, Roback, John D., Patel, Ravi M., Josephson, Cassandra D., Kamaleswaran, Rishikesan
Objective: Blood transfusions, crucial in managing anemia and coagulopathy in ICU settings, require accurate prediction for effective resource allocation and patient risk assessment. However, existing clinical decision support systems have primarily targeted a particular patient demographic with unique medical conditions and focused on a single type of blood transfusion. This study aims to develop an advanced machine learning-based model to predict the probability of transfusion necessity over the next 24 hours for a diverse range of non-traumatic ICU patients. Methods: We conducted a retrospective cohort study on 72,072 adult non-traumatic ICU patients admitted to a high-volume US metropolitan academic hospital between 2016 and 2020. We developed a meta-learner and various machine learning models to serve as predictors, training them annually with four-year data and evaluating on the fifth, unseen year, iteratively over five years. Results: The experimental results revealed that the meta-model surpasses the other models in different development scenarios. It achieved notable performance metrics, including an Area Under the Receiver Operating Characteristic (AUROC) curve of 0.97, an accuracy rate of 0.93, and an F1-score of 0.89 in the best scenario. Conclusion: This study pioneers the use of machine learning models for predicting blood transfusion needs in a diverse cohort of critically ill patients. The findings of this evaluation confirm that our model not only predicts transfusion requirements effectively but also identifies key biomarkers for making transfusion decisions.
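The evaluation scheme described in the Methods (train on four years, test on the remaining unseen year, repeated over five years) can be read as a leave-one-year-out split. The sketch below illustrates that reading only; the function name `leave_one_year_out` is hypothetical, and the study may instead have used a strictly forward-in-time scheme that the abstract does not spell out.

```python
def leave_one_year_out(years):
    """Return (train_years, test_year) pairs, holding out each year once.

    Assumption: every year serves as the unseen test year exactly once,
    and the remaining years form the training set.
    """
    splits = []
    for held_out in years:
        train = [y for y in years if y != held_out]
        splits.append((train, held_out))
    return splits


if __name__ == "__main__":
    for train_years, test_year in leave_one_year_out([2016, 2017, 2018, 2019, 2020]):
        print("train:", train_years, "test:", test_year)
```

With the 2016 to 2020 admission window from the abstract, this yields five splits, each with four training years and one held-out year.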
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Maryland (0.04)
- North America > United States > Florida > Palm Beach County > Jupiter (0.04)
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Zhang, Zhongping, Gu, Yiwen, Plummer, Bryan A.
Article comprehension is an important challenge in natural language processing with many applications such as article generation or image-to-article retrieval. Prior work typically encodes all tokens in articles uniformly using pretrained language models. However, in many applications, such as understanding news stories, these articles are based on real-world events and may reference many named entities that are difficult to accurately recognize and predict by language models. To address this challenge, we propose an ENtity-aware article GeneratIoN and rEtrieval (ENGINE) framework, to explicitly incorporate named entities into language models. ENGINE has two main components: a named-entity extraction module to extract named entities from both metadata and embedded images associated with articles, and an entity-aware mechanism that enhances the model's ability to recognize and predict entity names. We conducted experiments on three public datasets: GoodNews, VisualNews, and WikiText, where our results demonstrate that our model can boost both article generation and article retrieval performance, with a 4-5 perplexity improvement in article generation and a 3-4% boost in recall@1 in article retrieval. We release our implementation at https://github.com/Zhongping-Zhang/ENGINE .
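The retrieval gain above is reported as recall@1. As a reminder of what that metric measures, here is a minimal sketch of recall@k for queries with a single correct item each; the function name `recall_at_k` and the toy identifiers are illustrative assumptions, not taken from the ENGINE code base.

```python
def recall_at_k(ranked_ids, gold_ids, k=1):
    """Fraction of queries whose gold item appears in the top-k ranked results.

    ranked_ids: one ranked list of candidate ids per query (best first).
    gold_ids:   the single correct id for each query, in the same order.
    """
    if not gold_ids or len(ranked_ids) != len(gold_ids):
        raise ValueError("need one ranked list per non-empty gold list entry")
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


if __name__ == "__main__":
    ranked = [["a", "b"], ["c", "d"], ["f", "e"]]
    gold = ["a", "d", "e"]
    print(recall_at_k(ranked, gold, k=1))
    print(recall_at_k(ranked, gold, k=2))
```

In the toy data only the first query has its gold item ranked first, so recall@1 is one third, while all three gold items fall within the top two.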
- North America > The Bahamas (0.14)
- North America > United States > Texas (0.14)
- Europe > Spain > Galicia > Madrid (0.04)
- (17 more...)
- Media > Television (1.00)
- Media > Film (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- (7 more...)
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Varshney, Neeraj, Yao, Wenlin, Zhang, Hongming, Chen, Jianshu, Yu, Dong
Recently developed large language models have achieved remarkable success in generating fluent and coherent text. However, these models often tend to 'hallucinate' which critically hampers their reliability. In this work, we address this crucial problem and propose an approach that actively detects and mitigates hallucinations during the generation process. Specifically, we first identify the candidates of potential hallucination leveraging the model's logit output values, check their correctness through a validation procedure, mitigate the detected hallucinations, and then continue with the generation process. Through extensive experiments with GPT-3.5 (text-davinci-003) on the 'article generation task', we first demonstrate the individual efficacy of our detection and mitigation techniques. Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations. Importantly, our mitigation technique does not introduce new hallucinations even in the case of incorrectly detected hallucinations, i.e., false positives. Then, we show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average. We further demonstrate the effectiveness and wide applicability of our approach through additional studies including performance on different types of questions (multi-hop and false premise questions) and with another LLM from a different model family (Vicuna). In summary, our work contributes to improving the reliability and trustworthiness of large language models, a crucial step en route to enabling their widespread adoption in real-world applications.
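The detection step above starts from the model's logit values. A minimal sketch of that idea follows, assuming a simple rule: convert each generation step's logits to probabilities and flag tokens whose chosen probability falls below a threshold. The function names, the threshold of 0.5, and the toy logits are assumptions for illustration; the abstract does not state the paper's actual scoring rule, and the validation and mitigation stages are not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw logit values."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def flag_low_confidence(tokens, step_logits, chosen_ids, threshold=0.5):
    """Return (token, probability) pairs whose probability is below threshold.

    tokens:      generated tokens, one per step.
    step_logits: the full logit vector the model produced at each step.
    chosen_ids:  index of the generated token within each logit vector.
    threshold:   hypothetical cut-off; flagged tokens would go on to validation.
    """
    flagged = []
    for token, logits, idx in zip(tokens, step_logits, chosen_ids):
        prob = softmax(logits)[idx]
        if prob < threshold:
            flagged.append((token, prob))
    return flagged


if __name__ == "__main__":
    tokens = ["Paris", "1889"]
    step_logits = [[5.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    chosen_ids = [0, 1]
    print(flag_low_confidence(tokens, step_logits, chosen_ids))
```

In the toy input the first token is generated with high probability and passes, while the second is drawn from a flat distribution and is flagged as a candidate for validation.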
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Hawaii (0.04)
- (23 more...)
- Research Report (1.00)
- Workflow (0.66)
- Media (1.00)
- Leisure & Entertainment > Sports (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)
Robots? Some Companies Find Only Humans Can Do the Job
Companies have been trying out automatons to serve food in restaurants, make home deliveries or do chores in stores, partly in hopes of easing the worker shortage. Among the disenchanted, FedEx Corp. said last month it was powering down Roxo, its last-mile delivery robot, to prioritize several "nearer-term opportunities," a spokeswoman said. Amazon.com Inc. said it was ending field tests of Scout, its home-delivery robot, after learning that some aspects of its "unique delivery experience" weren't "meeting customers' needs," a company spokeswoman said. And over the summer, DoorDash Inc. said it was shutting down its Chowbotics business -- best known for Sally, the salad-making robot -- roughly 18 months after buying it. "While we gained valuable insights into how to better serve this market, we concluded our current approach was not meeting our very high thresholds for continued investment," a DoorDash spokesman said.
- North America > United States > Texas > Travis County > Austin (0.05)
- North America > United States > Nevada > Clark County > Las Vegas (0.05)
- North America > United States > Nevada > Clark County > Henderson (0.05)
- (3 more...)
Robots? Some Companies Find Only Humans Can Do the Job
Among the disenchanted, FedEx Corp. said last month it was powering down Roxo, its last-mile delivery robot, to prioritize several "nearer-term opportunities," a spokeswoman said. Amazon.com Inc. said it was ending field tests of Scout, its home-delivery robot, after learning that some aspects of its "unique delivery experience" weren't "meeting customers' needs," a company spokeswoman said. And over the summer, DoorDash Inc. said it was shutting down its Chowbotics business--best known for Sally, the salad-making robot--roughly 18 months after buying it. "While we gained valuable insights into how to better serve this market, we concluded our current approach was not meeting our very high thresholds for continued investment," a DoorDash spokesman said. Companies have entertained hopes that the growing variety of robots could help them not only weather the worker shortage, but speed up labor-intensive tasks, improve customer service by reducing the number of things the human workers have to do, and as an added bonus, position their brands as innovative and forward-leaning.
- North America > United States > Texas > Travis County > Austin (0.05)
- North America > United States > Nevada > Clark County > Las Vegas (0.05)
- North America > United States > Nevada > Clark County > Henderson (0.05)
- (3 more...)
An Upcoming Segment of Advancements will Explore Developments in Artif
An upcoming segment of Advancements with Ted Danson will discover how developments in technology are making artificial intelligence universally accessible. Viewers will learn how the technology is disrupting the Cloud and Internet of Things market, enabling businesses to supercharge their intelligence layer effortlessly, and allowing users to seamlessly embed and run AI-powered applications on any device, from anywhere in the world. Audiences will learn how SliceX AI is building a next-generation intelligence engine that makes AI fast, cost-effective, and easy to train and deploy across devices and use-cases. The show will explore how the proprietary technology's built-in privacy and breakthrough performance allow businesses of all sizes and developers to get their own slice of next-generation AI and apply it to any vertical such as retail, finance, customer support, health and many more. "Over the past few years, we have witnessed an explosive growth in software-driven automation via AI and the creation of advanced intelligent systems. However, given the staggering amount of computing power and cost required to develop and deploy cutting-edge AI in production, it is no surprise that most of the recent progress in AI has been driven by and limited to a handful of large companies that have access to plenty of resources at their disposal," said Sujith Ravi, Founder & CEO of SliceX AI.
Learn how Artificial Intelligence is Improving the Healthcare Experience
JUPITER, Fla., Jan. 6, 2022 /PRNewswire-PRWeb/ -- Scheduled to broadcast spring/2022, the award-winning series, Advancements with Ted Danson, will discover how innovations in AI are helping employees to access, understand, and utilize their health benefits. In this segment, Advancements will explore why so many Americans lack an understanding about their healthcare benefits -- from complex rules to complicated, verbose verbiage. Viewers will learn about the many ways these complexities can negatively impact employees' health and well-being, productivity in the workplace, and ultimately, the U.S. workforce as a whole. Audiences will hear from experts at Insurights, an AI-powered startup on a mission to improve human health by giving people better access to their health benefits. The show will discover how developments in AI and technology present a solution for the industry as the Insurights team introduces Zoe, its digital healthcare navigator.
- North America > United States > Florida > Palm Beach County > Jupiter (0.26)
- Asia > Middle East > Israel (0.08)
- North America > United States > New York (0.06)
- Press Release (0.92)
- Research Report > Promising Solution (0.37)
- Health & Medicine > Consumer Health (0.94)
- Health & Medicine > Health Care Technology (0.57)